Computational analysis of bacterial RNA-Seq data
Abstract
Recent advances in high-throughput RNA sequencing (RNA-seq) have enabled tremendous leaps forward in our understanding of bacterial transcriptomes. However, computational methods for analysis of bacterial transcriptome data have not kept pace with the large and growing data sets generated by RNA-seq technology. Here, we present new algorithms, specific to bacterial gene structures and transcriptomes, for analysis of RNA-seq data. The algorithms are implemented in an open source software system called Rockhopper that supports various stages of bacterial RNA-seq data analysis, including aligning sequencing reads to a genome, constructing transcriptome maps, quantifying transcript abundance, testing for differential gene expression, determining operon structures and visualizing results. We demonstrate the performance of Rockhopper using 2.1 billion sequenced reads from 75 RNA-seq experiments conducted with Escherichia coli, Neisseria gonorrhoeae, Salmonella enterica, Streptococcus pyogenes and Xenorhabdus nematophila. We find that the transcriptome maps generated by our algorithms are highly accurate when compared with focused experimental data from E. coli and N. gonorrhoeae, and we validate our system's ability to identify novel small RNAs, operons and transcription start sites. Our results suggest that Rockhopper can be used for efficient and accurate analysis of bacterial RNA-seq data, and that it can aid with elucidation of bacterial transcriptomes.
Similar resources
Detection of Bacterial Small Transcripts from RNA-Seq Data: A Comparative Assessment
Small non-coding RNAs (sRNAs) are regulatory RNA molecules that have been identified in a multitude of bacterial species and shown to control numerous cellular processes through various regulatory mechanisms. In the last decade, next generation RNA sequencing (RNA-seq) has been used for the genome-wide detection of bacterial sRNAs. Here we describe sRNA-Detect, a novel approach to identify expr...
TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data
MOTIVATION: Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recen...
Identification of sRNAs expressed by the human pathogen Neisseria gonorrhoeae under disparate growth conditions
In the last several years, bacterial gene regulation via small RNAs (sRNAs) has been recognized as an important mechanism controlling expression of essential proteins that are critical to bacterial growth and metabolism. Technologies such as RNA-seq are rapidly expanding the field of sRNAs and are enabling a global view of the "sRNAome" of several bacterial species. While numerous sRNAs have be...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Resolving host–pathogen interactions by dual RNA-seq
The transcriptome is a powerful proxy for the physiological state of a cell, healthy or diseased. As a result, transcriptome analysis has become a key tool in understanding the molecular changes that accompany bacterial infections of eukaryotic cells. Until recently, such transcriptomic studies have been technically limited to analyzing mRNA expression changes in either the bacterial pathogen o...